
    Fast and Lean Immutable Multi-Maps on the JVM based on Heterogeneous Hash-Array Mapped Tries

    An immutable multi-map is a many-to-many, thread-friendly map data structure with expected fast insert and lookup operations. This data structure is used by applications that process graphs or many-to-many relations, as found in static analysis of object-oriented systems. When processing such big data sets, the memory overhead of the data structure encoding itself becomes a memory usage bottleneck. Motivated by reuse and type safety, libraries for Java, Scala and Clojure typically implement immutable multi-maps by nesting sets as the values with the keys of a trie map. With this design, based on our measurements, the expected overhead of a sparse multi-map per stored entry adds up to around 65 bytes, which renders it infeasible to compute with effectively on the JVM. In this paper we propose a general framework for Hash-Array Mapped Tries on the JVM which can store type-heterogeneous keys and values: a Heterogeneous Hash-Array Mapped Trie (HHAMT). Among other applications, this allows for a highly efficient multi-map encoding by (a) not reserving space for empty value sets and (b) inlining the values of singleton sets, while maintaining (c) a type-safe API. We detail the necessary encoding and optimizations to mitigate the overhead of storing and retrieving heterogeneous data in a hash trie. Furthermore, we evaluate HHAMT specifically for the application to multi-maps, comparing it to state-of-the-art encodings of multi-maps in Java, Scala and Clojure. We isolate key differences using microbenchmarks and validate the resulting conclusions on a real-world case in static analysis. The new encoding brings the per key-value storage overhead down to 30 bytes: a 2x improvement. With additional inlining of primitive values it reaches a 4x improvement.
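    The inlining in (b) is the key saving: storing a lone value directly in the trie node avoids allocating a single-element set wrapper for it. A minimal Java sketch of that idea follows, with hypothetical names and a simplified node layout (the actual HHAMT encoding is more involved): a node's payload slot is deliberately heterogeneous, holding either one inlined value or a nested set, and the accessor normalizes both shapes behind a type-safe API.

        import java.util.Set;

        // Sketch of a heterogeneous multi-map trie node. dataMap/nodeMap are the
        // usual HAMT bitmaps selecting which of the 32 branches carry payload vs.
        // sub-nodes; content stores key/payload pairs flat (key at 2*i, payload
        // at 2*i + 1). All names are illustrative, not the paper's implementation.
        final class MultiMapNodeSketch<K, V> {

            private final int dataMap;
            private final int nodeMap;

            // The payload slot is intentionally untyped: it holds either a V
            // (singleton, inlined) or a Set<V> (two or more values) -- the
            // type-heterogeneity that saves the per-singleton wrapper object.
            private final Object[] content;

            MultiMapNodeSketch(int dataMap, int nodeMap, Object[] content) {
                this.dataMap = dataMap;
                this.nodeMap = nodeMap;
                this.content = content;
            }

            // Lookup normalizes both payload shapes behind a type-safe API.
            @SuppressWarnings("unchecked")
            Set<V> valuesAt(int index) {
                Object payload = content[2 * index + 1];
                if (payload instanceof Set<?>) {
                    return (Set<V>) payload;   // nested set: 2+ values
                }
                return Set.of((V) payload);    // singleton: the value was inlined
            }
        }

    On the JVM, even a small set object carries headers and internal fields, so dropping that wrapper for the common singleton case is plausibly where much of the reported improvement comes from.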

    M3: An Open Model for Measuring Code Artifacts

    This document details the design considerations of M3: a meta model for source code artifacts.

    How to make a bridge between transformation and analysis technologies?

    At the Dagstuhl seminar on Transformation Techniques in Software Engineering we held an organized discussion on the intricacies of engineering practical connections between software analysis and software transformation tools. This abstract summarizes it. The discussion contributes mainly by explicitly focusing on this subject from a general perspective, and by providing a first sketch of a domain analysis. First we discuss the solution space in general, and then we compare the merits of two entirely different designs: the monolithic versus the heterogeneous approach.

    Looking Towards a Future where Software is Controlled by the Public (and not the other way round)

    Nowadays, software has a ubiquitous presence in everyday life, and this phenomenon gives rise to a range of challenges that affect both individuals and society as a whole. In this article we argue that, in the future, the domain of software should no longer belong to technical experts and system integrators alone. Instead, it should transition to a firmly engaged public domain, similar to city planning, social welfare and security. The challenge at the heart of this problem is the ability to understand, on a technical level, what all the different software actually is and what it does with our information.

    Code Specialization for Memory Efficient Hash Tries

    The hash trie data structure is a common part of the standard collection libraries of JVM programming languages such as Clojure and Scala. It enables fast immutable implementations of maps, sets, and vectors, but it requires considerably more memory than an equivalent array-based data structure. This hinders the scalability of functional programs and the further adoption of this otherwise attractive style of programming. In this paper we present a product family of hash tries. We generate Java source code to specialize them using knowledge of the JVM's object memory layout. The number of possible specializations is exponential; the optimization challenge is thus to find a minimal set of variants which leads to a maximal reduction in memory footprint on any given data. Using a set of experiments we measured the distribution of internal tree node sizes in hash tries. We used the results as guidance to decide which variants of the family to generate and which to leave to the generic implementation. A preliminary validation experiment on the implementation of sets and maps shows that this technique leads to a median decrease of 55% in memory footprint for maps (and 78% for sets), while still maintaining comparable performance. Our combination of data analysis and code specialization proved to be effective.
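    To make the trade-off concrete, here is a minimal, hypothetical Java sketch of arity-based specialization (the class names and the chosen arity are illustrative, not the paper's generated code): frequent small node sizes get dedicated classes whose slots are plain fields, eliminating the per-node cost of a separate array object (its header, length field, and pointer indirection), while rare sizes fall back to a generic array-based variant.

        // Common interface over all members of the product family of node classes.
        abstract class TrieNode {
            abstract Object slot(int index);
            abstract int arity();
        }

        // Specialized variant for an (empirically frequent) two-slot node:
        // no array object at all, so no array header or length field per node.
        final class TrieNode2 extends TrieNode {
            private final Object slot0, slot1;
            TrieNode2(Object s0, Object s1) { slot0 = s0; slot1 = s1; }
            @Override Object slot(int index) { return index == 0 ? slot0 : slot1; }
            @Override int arity() { return 2; }
        }

        // Generic fallback: rare or large arities pay the array overhead, which
        // keeps the number of generated classes (and thus code size) small.
        final class TrieNodeGeneric extends TrieNode {
            private final Object[] slots;
            TrieNodeGeneric(Object[] slots) { this.slots = slots; }
            @Override Object slot(int index) { return slots[index]; }
            @Override int arity() { return slots.length; }
        }

    Because the number of possible specializations is exponential, the measured node-size distribution is what decides which few variants (like TrieNode2 above) are worth generating at all.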

    Parse Forest Diagnostics with Dr. Ambiguity

    In this paper we propose and evaluate a method for locating causes of ambiguity in context-free grammars by automatic analysis of parse forests. A parse forest is the set of parse trees of an ambiguous sentence, produced for example by a static ambiguity detection tool that has detected ambiguity in a context-free grammar, or by a general parser that has accidentally parsed an ambiguous sentence. Deducing causes of ambiguity from observing parse forests is hard for grammar engineers because of (a) the size of parse forests, (b) their complex shape, and (c) the diversity of causes of ambiguity. We first analyze the diversity of ambiguities in grammars for programming languages and the diversity of solutions to these ambiguities. Then we introduce Dr. Ambiguity: a parse forest diagnostics tool that explains the causes of ambiguity by analyzing differences between parse trees and proposes solutions. We demonstrate its effectiveness using a small experiment with a grammar for Java 5.
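    The core diagnostic step can be pictured as a lockstep comparison of two alternative parse trees for the same sentence: the topmost node at which they diverge localizes a cause of ambiguity. The following Java sketch illustrates that comparison under assumed types; the Tree record and its fields are hypothetical, not Dr. Ambiguity's actual API.

        import java.util.List;

        // A parse tree node: the production applied and the subtrees it derived.
        record Tree(String production, List<Tree> children) {

            // Returns the topmost node at which two alternative parse trees of
            // the same sentence diverge, or null if the trees are identical.
            static Tree findTopmostDifference(Tree left, Tree right) {
                if (!left.production().equals(right.production())
                        || left.children().size() != right.children().size()) {
                    return left; // the trees already disagree at this node
                }
                Tree diff = null;
                for (int i = 0; i < left.children().size(); i++) {
                    Tree childDiff = findTopmostDifference(
                            left.children().get(i), right.children().get(i));
                    if (childDiff != null) {
                        if (diff != null) {
                            return left; // several subtrees differ: the divergence
                        }                // effectively starts at this node
                        diff = childDiff;
                    }
                }
                return diff;
            }
        }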

    Faster ambiguity detection by grammar filtering

    Real programming languages are often defined using ambiguous context-free grammars. Some ambiguity is intentional, while other ambiguity is accidental. A good grammar development environment should therefore contain a static ambiguity checker to help the grammar engineer. Ambiguity of context-free grammars is an undecidable property; nevertheless, various imperfect ambiguity checkers exist. Exhaustive methods are accurate, but suffer from non-termination. Approximative methods guarantee termination, at the expense of accuracy. In this paper we combine an approximative method with an exhaustive method. We present an extension to the Noncanonical Unambiguity Test that identifies production rules that do not contribute to the ambiguity of a grammar, and we show how this information can be used to significantly reduce the search space of exhaustive methods. Our experimental evaluation on a number of real-world grammars shows orders-of-magnitude gains in efficiency in some cases and negligible losses of efficiency in others.
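    The shape of the combined method can be sketched as a two-stage pipeline, shown below in hypothetical Java (the production representation and both stage implementations are assumptions for illustration, not the paper's code): an approximative stage first marks productions that provably cannot contribute to ambiguity, and the expensive exhaustive search then runs only over the remaining subset.

        import java.util.Optional;
        import java.util.Set;
        import java.util.function.Function;
        import java.util.function.Predicate;
        import java.util.stream.Collectors;

        final class FilteredAmbiguitySearch {

            // Productions are plain strings here purely for illustration.
            static Optional<String> findAmbiguousSentence(
                    Set<String> productions,
                    Predicate<String> provablyHarmless,   // approximative stage
                    Function<Set<String>, Optional<String>> exhaustiveSearch) {

                // Keep only productions the approximation could not rule out;
                // this filtered subset is what shrinks the exhaustive search space.
                Set<String> suspects = productions.stream()
                        .filter(p -> !provablyHarmless.test(p))
                        .collect(Collectors.toSet());

                return exhaustiveSearch.apply(suspects);
            }
        }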